Simple algorithm for judging equivalence of differential-algebraic equation systems

Mathematical formulas play a prominent role in science, technology, engineering, and mathematics (STEM) documents; understanding STEM documents usually requires knowing the difference between equation groups containing multiple equations. When two equation groups can be transformed into the same form, we call the equation groups equivalent. Existing tools cannot judge the equivalence of two equation groups; thus, we develop an algorithm to judge such an equivalence using a computer algebra system. The proposed algorithm first eliminates variables appearing only in either equation group. It then checks the equivalence of the equations one by one: the equations with identical algebraic solutions for the same variable are judged equivalent. If each equation in one equation group is equivalent to an equation in the other, the equation groups are judged equivalent; otherwise, non-equivalent. We generated 50 pairs of equation groups for evaluation. The proposed method accurately judged the equivalence of all pairs. This method is expected to facilitate comprehension of a large amount of mathematical information in STEM documents. Furthermore, this is a necessary step for machines to understand equations, including process models.

The volume of scientific literature has been increasing exponentially, with an average doubling period of 15 years 1 , and this trend is expected to continue. When writing a report on a particular topic, such as a review of previous studies, it is necessary to survey this growing body of literature. The key to understanding multiple documents and organizing their information is to recognize the differences among the documents, which requires considerable effort. Automatically judging the equivalence of the information would help in processing a large number of documents efficiently.
Equations representing the relationships between variables play a central role in understanding documents of science, technology, engineering, and mathematics (STEM). Multiple equations are often used as a single entity to describe the relationship between variables. It is, therefore, crucial to recognize the difference between equation groups consisting of two or more equations when understanding STEM documents. A physical model is a typical example of an equation group. For example, suppose a researcher wants to build a physical model. Before building the model, the researcher surveys previous studies and identifies the differences among the models they contain, which is an arduous task. Several studies have dealt with text in the chemical engineering field using natural language processing techniques [2][3][4], but no studies have aimed to reduce this kind of effort.
Since a variable is sometimes expressed by different symbols among documents, we have to extract variable definitions [5][6][7] and unify the variable symbols' representations 8 before judging the equivalence of equation groups. Such a method can be developed independently of the equivalence judgment; thus, we assume that different symbols do not represent the same variable in this study.
In order to judge the equivalence of equation groups, computers must grasp the meanings of mathematical formulas. Converting natural language into vectors is one of the methods for computers to handle the meanings of natural language, and recent studies utilize neural network models, such as Word2Vec 9 , Transformer 10 , and Bidirectional Encoder Representations from Transformers (BERT) 11 . Similarly, several studies represent mathematical formulas with neural network models [12][13][14] . Mansouri et al. 12 defined a similarity between two formulas based on their appearances, but similar-looking formulas do not necessarily perform the same calculation. For example, the similarity between a + b = 0 and a − b = 0 is higher than that between a + b = 0 and a = −b based on the models by Mansouri et al. 12 . The existing neural network models, which focus on the appearance of formulas, do not work for the equivalence judgment of equation groups. Another approach for handling the meanings of mathematical formulas in computers is encoding the formulas with special markup languages such as Content Mathematical Markup Language (MathML) 15 and OMDoc 16 . However, such markup is rarely used to publish mathematical knowledge 17 . The commonly used methods for notating mathematical expressions are LaTeX for papers and Presentation MathML 15 for the Web.
The most effective way for computers to comprehend formulas' meanings is to use computer algebra systems (CASs). Formula transformations, for example, from LaTeX to the format usable in CASs, are well-studied in the CASs literature 18,19 . Further, CASs, such as Mathematica and Maple, have LaTeX input support.
CASs can solve equations and judge the equivalence of two equations by comparing their solutions for one variable. Similarly, if two equation groups are solvable for one variable, CASs can judge their equivalence by comparing their solutions. However, when one of two equation groups to be compared is not solvable for one variable, CASs alone cannot correctly judge their equivalence. Physical models are commonly represented by combinations of differential equations and algebraic equations, called differential-algebraic equation (DAE) systems. Therefore, it is essential to address this issue for machines to compare equation groups contained in documents related to chemical engineering.
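The single-equation comparison described above can be sketched with SymPy as follows. This is a minimal illustration rather than the authors' implementation: the function name `equations_equivalent` is hypothetical, only plain algebraic symbols are handled (no derivatives), and the solution for the chosen variable is assumed to be unique.

```python
import sympy as sp


def equations_equivalent(eq_a, eq_b):
    """Hypothetical helper: compare two equations by their solutions for one variable."""
    vars_a, vars_b = eq_a.free_symbols, eq_b.free_symbols
    # Equations with different variable sets are treated as non-equivalent.
    if vars_a != vars_b:
        return False
    # Pick one shared variable deterministically (alphabetical order is an assumption).
    v = min(vars_a, key=lambda s: s.name)
    sol_a = sp.solve(eq_a, v)
    sol_b = sp.solve(eq_b, v)
    # The sketch only covers the unique-solution case.
    if len(sol_a) != 1 or len(sol_b) != 1:
        raise ValueError("each equation must have exactly one solution for the variable")
    # Equivalent if the difference of the two solutions simplifies to zero.
    return bool(sp.simplify(sol_a[0] - sol_b[0]) == 0)


a, b = sp.symbols("a b")
print(equations_equivalent(sp.Eq(a + b, 0), sp.Eq(a, -b)))     # True
print(equations_equivalent(sp.Eq(a + b, 0), sp.Eq(a - b, 0)))  # False
```

The two calls reuse the formulas from the similarity example: a + b = 0 and a = −b look different but share the solution a = −b, whereas a + b = 0 and a − b = 0 look similar but do not.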
In this study, we propose a method for solving this problem. The proposed method uses a computer algebra system to eliminate variables contained only in either equation group and judge whether each equation in one equation group is equivalent to an equation in the other. We generate 50 equivalent and non-equivalent pairs of equation groups and evaluate the performance of our proposed method.

Methods
Equivalence judgment methods. Our algorithm utilizes a CAS to (1) solve equations, (2) substitute formulas into variables to eliminate the variables, and (3) judge whether two formulas are equivalent.
In this study, we assume that two equivalent equations are solvable for any variable and that each equation has exactly one solution for that variable, that is, the solution for the variable is unique.
Equivalence judgment of equations. Two equivalent equations have to satisfy the following requirements:
• The equations have the same set of variables.
• The solutions of the equations for any variable are the same.
To eliminate a variable v in an equation group E, the algorithm first obtains E_v, the set of equations in E that include v (Line 1). If the ith equation in E_v, e_i, is solvable for v and has only one solution for v, the solution of e_i for v, s, is computed (Lines 3-5). Then, s is substituted into v in all equations in E_v except e_i, and the set of the substituted equations E'_v is obtained (Lines 6 and 7). Finally, the set of equations E* that does not include v is derived.
Equivalence judgment of equation groups. We judge not only the equivalence between two equations but also the equivalence between two equation groups consisting of multiple equations. Two equation groups that are equivalent after variable elimination need to satisfy two conditions: each equation in one group is equivalent to an equation in the other, and the two groups contain the same number of equations. Based on these conditions, we propose Algorithm 3 for the equivalence judgment of two equation groups E_A and E_B, whose sets of variables are V_{E_A} and V_{E_B}, respectively.
At first, the algorithm compares V_{E_A} and V_{E_B} and transforms them so that they become identical (Lines 1-19). Here, we define the between-shared variables V_bs as the variables shared between E_A and E_B, and the within-shared variables V_{ws,E_i} as the variables shared between the equations within an equation group E_i (Lines 1-3). V_{E_A} and V_{E_B} are made identical by eliminating the variables that appear in only one of the equation groups. In addition, any variable that can be eliminated from an equation group must be one of its within-shared variables. Hence, the variables to be eliminated in E_A and E_B, denoted by V*_{E_A} and V*_{E_B}, are the set difference of V_{ws,E_A} and V_bs and that of V_{ws,E_B} and V_bs, respectively (Lines 4 and 5). When the variables appearing in only one equation group are not equal to the variables to be eliminated, V_{E_A} and V_{E_B} cannot be made identical. In such a case, E_A and E_B are judged non-equivalent (Lines 6 and 7). Otherwise, the variables in V*_{E_A} and V*_{E_B} are eliminated one by one until the two equation groups share the same set of variables (Lines 8-18).
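The set computations in this step can be sketched with Python sets over SymPy symbols. The helper names below are hypothetical, and "within-shared" is interpreted here as "appearing in at least two equations of the same group", which is an assumption about the definition. The two small groups use only the two relations quoted for Case 4, not the full model.

```python
import sympy as sp


def variables(group):
    """All symbols appearing in an equation group."""
    return set().union(*(eq.free_symbols for eq in group))


def within_shared(group):
    """Symbols appearing in at least two equations of the group (assumed definition)."""
    seen, shared = set(), set()
    for eq in group:
        shared |= seen & eq.free_symbols
        seen |= eq.free_symbols
    return shared


def variables_to_eliminate(group_a, group_b):
    """Return (V*_A, V*_B), or None when the variable sets cannot be made identical."""
    v_a, v_b = variables(group_a), variables(group_b)
    v_bs = v_a & v_b                          # between-shared variables
    elim_a = within_shared(group_a) - v_bs    # candidates for elimination in A
    elim_b = within_shared(group_b) - v_bs    # candidates for elimination in B
    # Every variable unique to one group must be eliminable; otherwise non-equivalent.
    if elim_a != v_a - v_bs or elim_b != v_b - v_bs:
        return None
    return elim_a, elim_b


r_A, k_0, E, R, T, C_A, k = sp.symbols("r_A k_0 E R T C_A k")
group_a = [sp.Eq(-r_A, k_0 * sp.exp(-E / (R * T)) * C_A)]
group_b = [sp.Eq(-r_A, k * C_A), sp.Eq(k, k_0 * sp.exp(-E / (R * T)))]
print(variables_to_eliminate(group_a, group_b))
```

For these groups, nothing needs to be eliminated from the first group and only k from the second. If k appeared in just one equation of the second group, it would not be within-shared and the function would return `None`.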
After the variable elimination, the algorithm checks whether each equation in E_A is equivalent to an equation in E_B and whether the two groups contain the same number of equations. If all the equations in E_A and E_B have a one-to-one relationship, the two equation groups are judged to be equivalent; otherwise, non-equivalent (Lines 20-24).
Figure 2 shows an example of two equivalent equation groups and their within-shared and between-shared variables. Our proposed algorithm eliminates the variable k to make the sets of variables in these equation groups identical, checks whether each equation in E_A is equivalent to an equation in E_B, and judges that they are equivalent.
We implemented our proposed algorithm in Python using a Python-based CAS, SymPy 23 . We prepared TeX-formatted equation groups and parsed them using the 'parse_latex' function. Although the 'parse_latex' function is incomplete, we confirmed that all equations were converted correctly to SymPy expressions.
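The one-to-one check performed after variable elimination can be sketched as below. This is an illustrative sketch, not the published Algorithm 3: `same_equation` and `groups_equivalent` are hypothetical names, the equations are built directly as SymPy objects instead of being parsed from TeX, and greedy matching is used on the assumption that equation equivalence is transitive.

```python
import sympy as sp


def same_equation(eq_a, eq_b):
    """Hypothetical single-equation check: same variables, same unique solution."""
    if eq_a.free_symbols != eq_b.free_symbols:
        return False
    v = min(eq_a.free_symbols, key=lambda s: s.name)
    sol_a, sol_b = sp.solve(eq_a, v), sp.solve(eq_b, v)
    return bool(
        len(sol_a) == 1
        and len(sol_b) == 1
        and sp.simplify(sol_a[0] - sol_b[0]) == 0
    )


def groups_equivalent(group_a, group_b):
    """True if the groups have equal size and their equations match one to one."""
    if len(group_a) != len(group_b):
        return False
    unmatched = list(group_b)
    for eq_a in group_a:
        # Find a not-yet-used equation in group_b equivalent to eq_a.
        match = next((eq_b for eq_b in unmatched if same_equation(eq_a, eq_b)), None)
        if match is None:
            return False
        unmatched.remove(match)
    return True


x, y, z = sp.symbols("x y z")
group_a = [sp.Eq(x + y, 1), sp.Eq(z, 2 * x)]
group_b = [sp.Eq(x, z / 2), sp.Eq(y, 1 - x)]   # same relations, reordered and rearranged
group_c = [sp.Eq(x, z / 2), sp.Eq(y, 1 + x)]   # second relation differs
print(groups_equivalent(group_a, group_b))  # True
print(groups_equivalent(group_a, group_c))  # False
```

Removing each matched equation from `unmatched` is what enforces the one-to-one relationship: two equations in one group cannot both be matched to the same equation in the other.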

Results and discussion
The proposed method correctly judged the equivalence of all 50 pairs. This section describes how the proposed algorithm realized the correct judgment in each case.

Cases of equation equivalence judgment.
In Case 1, the sets of variables of the two equations were different; thus, Algorithm 1 returned false (Lines 1 and 2).
The two equations in Case 2 had the same set of variables and the same solutions for one variable, for example, w_1; thereby, Algorithm 1 returned true (Lines 9 and 10).
In Case 3 where the two equations had the same set of variables but the solutions for one variable were different, Algorithm 1 returned false (Lines 11 and 12).
Algorithm 1 fails to accurately judge the equivalence of equations in the following two cases: (1) when the number of solutions for one variable is more than one and (2) when either equation cannot be solved. The first case occurs when all variables in an equation appear in a second or higher-order form. Since such equations are rarely seen in descriptions of physical models, they are unlikely to be a problem in practice. The second case appears when an equation consists only of partial derivatives of variables, where it is impossible to solve for a single variable without information other than the equation. Developing a method to deal with such cases is a subject for future work.
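The first failure case can be reproduced in a few lines of SymPy. The equations below are hypothetical examples chosen for illustration and do not come from the evaluation data.

```python
import sympy as sp

x, y = sp.symbols("x y")

# Hypothetical equation in which every variable appears in second-order form.
all_quadratic = sp.Eq(x**2 + y**2, 1)
for v in (x, y):
    # Solving for either variable yields two branches, so no unique solution exists.
    print(v, len(sp.solve(all_quadratic, v)))

# Contrast: y appears linearly here, so solving for y gives a unique solution.
mixed = sp.Eq(x**2 + y, 1)
print(sp.solve(mixed, y))
```

In the first equation no choice of variable satisfies the unique-solution assumption, whereas in the second equation solving for y does.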

Cases of equation group equivalence judgment. In Case 4, the sets of variables of the two equation groups, V_A and V_B, were different. Algorithm 3 first derived the between-shared variables and within-shared variables of the two equation groups as follows (Lines 1-3):
(1) V_bs = {r_A, k_0, E, R, T, C_A, V, t, q, C_0, ρ, C, w, T_i, H_r, U, T_c}
Then, the variable appearing only in E_B, V*_{E_B} = {k}, was obtained (Line 5), and k in E_B was eliminated by substituting k = k_0 exp(−E/(RT)) into −r_A = k C_A (Lines 13-15). After this variable elimination, V_A and V_B became the same, and all equations in E_A were equivalent to those in E_B; thus, Algorithm 3 returned true (Lines 20 and 21).
In Case 5, E_A and E_B had different sets of variables, and variable elimination was required in both equation groups. In the same way as in Case 4, Algorithm 3 eliminated the variables in the two equation groups (Lines 9-18) and returned true (Lines 20 and 21).
Figure 2. An example of two equivalent equation groups E_A and E_B. V_{ws,E_A} and V_{ws,E_B} are the within-shared variables of E_A and E_B, and V_bs is the set of between-shared variables of the two equation groups.
Table 1. Pairs of equations used in experiments.
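The elimination of k in Case 4 can be sketched in SymPy as follows. The function `eliminate` is a hypothetical helper, not the published Algorithm 2. It assumes that the resulting group consists of the equations that never contained the variable together with the substituted ones. Only the two relations quoted for Case 4 are used, and the rest of the model is omitted.

```python
import sympy as sp


def eliminate(group, v):
    """Hypothetical helper: remove v by solving one equation for it and substituting."""
    with_v = [eq for eq in group if v in eq.free_symbols]
    without_v = [eq for eq in group if v not in eq.free_symbols]
    for i, eq in enumerate(with_v):
        sols = sp.solve(eq, v)
        if len(sols) == 1:  # use the first equation with a unique solution for v
            rest = with_v[:i] + with_v[i + 1:]
            # Assumed result: untouched equations plus the substituted ones.
            return without_v + [other.subs(v, sols[0]) for other in rest]
    raise ValueError(f"no equation has a unique solution for {v}")


r_A, k_0, E, R, T, C_A, k = sp.symbols("r_A k_0 E R T C_A k")
group_b = [
    sp.Eq(k, k_0 * sp.exp(-E / (R * T))),  # k = k_0 exp(-E/(RT))
    sp.Eq(-r_A, k * C_A),                  # -r_A = k C_A
]
reduced = eliminate(group_b, k)
target = sp.Eq(-r_A, k_0 * sp.exp(-E / (R * T)) * C_A)
print(reduced)
```

After the substitution the group contains a single equation without k, which has the same form as `target`, the rate expression written directly in terms of k_0, E, R, T, and C_A.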

Algorithm 3 sometimes fails to judge the equivalence correctly when an equation group does not lead to a single form after variable elimination. Suppose that we have an equation group (Eq. 4) from which k_1 needs to be eliminated.
The number of solutions for k_1 is three, and the equation group obtained after eliminating k_1 depends on which solution is used for the substitution. Although such examples will need to be addressed in the future, the proposed method is useful for judging the equivalence of many types of physical models, as shown in Supplementary Information (Table S1).

Conclusion
We proposed a simple rule-based method for equation group equivalence judgment. The proposed method eliminates variables that appear only in either equation group and checks whether all equations in the two equation groups have a one-to-one relationship. The method was implemented in Python with a Python-based CAS, SymPy, and 50 equivalent and non-equivalent pairs of equations and equation groups were used for experiments. The results have shown that the proposed method can accurately judge whether two equation groups are equivalent. The proposed method still has some limitations. It relies on two assumptions that should be removed: (1) any equation is solvable for one variable, and (2) the number of solutions for each equation is one. Furthermore, the equivalence judgment algorithm for equation groups (Algorithm 3) cannot correctly judge the equivalence of some equation groups that do not lead to a single form after variable elimination, as explained in Section 3.2. Our future work will address these limitations by extending the method in this paper. We also plan to expand the scope to more types of calculations, such as the summation symbol (Σ), the integral symbol (∫), vectors, and matrices.

Data availability
The datasets used in the current study are available at https://github.com/humansys-lab/dae-equiv-judge.